On Flexible Allocation of Index and Temporary Data in Parallel Database Systems

Authors

  • Erhard Rahm
  • Holger Märtens
  • Thomas Stöhr

Abstract

Data placement is a key factor for high-performance database systems. This is particularly true for parallel database systems, where data allocation must support both I/O parallelism and processing parallelism, within complex queries as well as between independent queries and transactions. Determining an effective data placement is a complex administration problem that depends on many parameters, including system architecture, database and workload characteristics, and hardware configuration. Research and tool support have so far concentrated on data placement for base tables, especially for Shared Nothing (SN) systems, e.g., [MD97]. To our knowledge, however, data placement issues for architectures in which multiple DBMS instances share access to the same disks (Shared Disk, Shared Everything, and specific hybrid architectures) have not yet been investigated systematically. Furthermore, little work has been published on the effective disk allocation of index structures and temporary data (e.g., intermediate query results). These allocation problems are nevertheless gaining importance, for instance for using parallel database systems effectively in decision support and data warehousing environments.

In the next section, we discuss the index allocation problem in more detail and introduce a classification of various approaches that are already supported to some degree in commercial DBMS. While SN offers only a few options, the other architectures provide greater flexibility because index allocation can be independent of the base table allocation. For certain index-supported queries, this can yield order-of-magnitude savings in I/O and communication cost. We then turn to the disk allocation of intermediate query results, for which the allocation parameters can be chosen dynamically at query run time. For the case of parallel hash joins, we outline how to determine an optimal approach that supports a high degree of parallelism.
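To make the index allocation argument more concrete, here is a minimal sketch under a deliberately simplified, hypothetical cost model; the names `exact_match_cost` and `LookupCost` are illustrative and do not come from the paper. It assumes that under SN an index on a non-partitioning attribute is fragmented exactly like its base table ("local index"), so an exact-match lookup must search every index fragment, whereas a Shared Disk system may fragment the index on its own key ("global index"), so only one fragment holds the search key. It counts fragment accesses only and ignores page counts, buffering, and message costs.

```python
from typing import NamedTuple


class LookupCost(NamedTuple):
    index_fragments: int  # index fragments (disks) that must be searched
    data_fragments: int   # upper bound on base-table fragments holding matches


def exact_match_cost(num_fragments: int, matches: int, global_index: bool) -> LookupCost:
    """Rough fragment-access count for an index-supported exact-match query
    on an attribute that is NOT the base table's partitioning attribute.

    Hypothetical model (illustration only, not the paper's cost model):
    * local index:  fragmented like the table, so every fragment is searched;
    * global index: fragmented on its own key, so one fragment holds the key.
    In both cases the matching rows lie on at most min(matches, num_fragments)
    base-table fragments.
    """
    if num_fragments < 1 or matches < 0:
        raise ValueError("need num_fragments >= 1 and matches >= 0")
    index = 1 if global_index else num_fragments
    data = min(matches, num_fragments)
    return LookupCost(index, data)


if __name__ == "__main__":
    for global_index in (False, True):
        cost = exact_match_cost(num_fragments=64, matches=3, global_index=global_index)
        label = "global" if global_index else "local"
        print(f"{label:6} {cost} total={sum(cost)}")
```

With 64 fragments and 3 matching rows, this model gives 67 fragment accesses for the local index versus 4 for the global one. The gap grows linearly with the number of fragments, which illustrates how order-of-magnitude savings can arise when index allocation is decoupled from base table allocation.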
The work discussed here is performed within a project that aims to develop strategies for automatically determining optimal data allocations, in order to simplify system administration in high-performance environments.
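For the temporary data discussed in the abstract, the following minimal sketch illustrates the general goal of spreading intermediate results so that parallel join processes do not contend for the same disk. It is not the paper's method. It assumes a Shared Disk or Shared Everything setting in which every join processor can write to every disk, and a hypothetical hash join in which each of `num_procs` processors spills its input into `num_buckets` bucket files that are then joined one bucket at a time by all processors concurrently. The staggered round-robin rule and the function names are illustrative assumptions.

```python
from collections import Counter


def place_temp_buckets(num_procs: int, num_buckets: int, num_disks: int) -> dict[tuple[int, int], int]:
    """Assign the temporary file of bucket b on join processor p to a disk.

    Hypothetical staggered round-robin scheme (an illustration, not the
    paper's method): file (p, b) goes to disk (b * num_procs + p) % num_disks.
    If num_procs <= num_disks, the num_procs files read concurrently in one
    join step (same bucket, all processors) land on distinct disks.
    """
    if min(num_procs, num_buckets, num_disks) < 1:
        raise ValueError("all parameters must be >= 1")
    return {
        (p, b): (b * num_procs + p) % num_disks
        for p in range(num_procs)
        for b in range(num_buckets)
    }


def max_files_per_disk_per_step(placement: dict[tuple[int, int], int],
                                num_procs: int, num_buckets: int) -> int:
    """Worst case, over all join steps, of how many concurrently read bucket
    files share one disk (1 means no disk contention within a step)."""
    worst = 0
    for b in range(num_buckets):
        per_disk = Counter(placement[(p, b)] for p in range(num_procs))
        worst = max(worst, max(per_disk.values()))
    return worst


if __name__ == "__main__":
    layout = place_temp_buckets(num_procs=4, num_buckets=3, num_disks=8)
    print(max_files_per_disk_per_step(layout, 4, 3))  # 1
    print(max_files_per_disk_per_step(place_temp_buckets(4, 3, 2), 4, 3))  # 2
```

Because the placement is a pure function of the processor, bucket, and disk counts, it could be computed at query run time, in line with the abstract's point that allocation parameters for intermediate results can be chosen dynamically. When `num_procs` exceeds `num_disks`, some concurrently read files must share a disk, as the second call shows.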

Similar resources

On Disk Allocation of Intermediate Query Results in Parallel Database Systems

For complex queries in parallel database systems, substantial amounts of data must be redistributed between operators executed on different processing nodes. Frequently, such intermediate results cannot be held in main memory and must be stored on disk. To limit the ensuing performance penalty, a data allocation must be found that supports parallel I/O to the greatest possible extent. In this p...

Developing a method for reliability allocation of series-parallel systems by considering common cause failure

Reliability allocation has an essential connection to design for reliability and is an important activity in the product design and development process. In determining the reliability of subsystems or components on the basis of goal reliability, attention must be paid to failure effect, failure information, and improvement opportunities based upon real potentials for reliability improvement. In...

Set a bi-objective redundancy allocation model to optimize the reliability and cost of the Series-parallel systems using NSGA II problem

With the wide-ranging global attention placed on quality, promoting and optimizing the reliability of products during the design process has become a high priority. In this study, the researchers have adopted one of the existing models in reliability science and propose a bi-objective model for redundancy allocation in series-parallel systems in accordance with th...

Static Task Allocation in Distributed Systems Using Parallel Genetic Algorithm

Over the past two decades, PC speeds have increased from a few instructions per second to several million instructions per second. The tremendous speed of today's networks as well as the increasing need for high-performance systems has made researchers interested in parallel and distributed computing. The rapid growth of distributed systems has led to a variety of problems. Task allocation is a...

Reliability Modelling of the Redundancy Allocation Problem in the Series-parallel Systems and Determining the System Optimal Parameters

Given the increasing attention to quality, promoting the reliability of products during the design process has gained significant importance. In this study, we consider one of the current models of reliability science and propose a nonlinear programming model for redundancy allocation in series-parallel systems according to the redundancy strategy and considering the assump...

Publication date: 1999